Predictive Analytics

Histogram of customers’ salary: most customers are paid RM1000-1200.

Relationship between salary and age: salary decreases with age.

Relationship between salary and gender: male customers are paid more than female customers.

Relationship between salary and day: most customers are paid on Monday.

1:Monday, 2:Tuesday, 3:Wednesday, 4:Thursday, 5:Friday, 6:Saturday, 7:Sunday
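
The day coding above can be applied in R by converting the numeric codes into a labelled factor, which makes day-level counts easier to read. This is a minimal sketch: the vector `weekday_code` holds made-up values for illustration and is not taken from the original data.

```r
# Hypothetical weekday codes, using the 1-7 coding listed above
weekday_code <- c(1, 5, 3, 1, 7)

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Convert the numeric codes into a labelled factor
weekday_label <- factor(weekday_code, levels = 1:7, labels = day_names)

# Count payments per day; days with no payments appear with a count of 0
day_counts <- table(weekday_label)
print(day_counts)
```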

Multiple Linear Regression Model

Model

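The p-values for gender and age are both below 0.001, so both have a statistically significant effect on the salary amount. Holding the other variables fixed, male customers are paid about RM427 more than female customers, and each additional year of age is associated with about RM13.66 less. Weekday is not significant (p = 0.651). The model explains only about 5% of the variation in salary (R-squared = 0.054).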


Call:
lm(formula = amount ~ gender + age + Weekday, data = data1)

Residuals:
    Min      1Q  Median      3Q     Max 
-1361.6  -756.4  -204.2   480.0  6858.2 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 2085.767    130.966  15.926  < 2e-16 ***
genderM      426.652     75.626   5.642 2.27e-08 ***
age          -13.657      3.087  -4.424 1.09e-05 ***
Weekday       11.595     25.601   0.453    0.651    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1121 on 879 degrees of freedom
Multiple R-squared:  0.05394,   Adjusted R-squared:  0.05071 
F-statistic: 16.71 on 3 and 879 DF,  p-value: 1.461e-10
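
To show how the fitted coefficients translate into a prediction, the sketch below plugs them into the regression equation by hand. The coefficient values are copied from the output above; the helper `predict_amount` and the two example customers are hypothetical and only serve as an illustration.

```r
# Coefficients copied from the summary output above
coefs <- c(intercept = 2085.767, genderM = 426.652,
           age = -13.657, weekday = 11.595)

# Hypothetical helper: predicted salary amount in RM
predict_amount <- function(male, age, weekday) {
  unname(coefs["intercept"] +
    coefs["genderM"] * as.numeric(male) +
    coefs["age"] * age +
    coefs["weekday"] * weekday)
}

# Hypothetical customers: a 30-year-old male and female, paid on Monday (code 1)
male_30   <- predict_amount(TRUE, 30, 1)
female_30 <- predict_amount(FALSE, 30, 1)
print(c(male = male_30, female = female_30))
```

For these inputs the prediction is about RM2114 for the male customer and about RM1688 for the female customer; the difference is the `genderM` coefficient. Note that `Weekday` enters the model as a single numeric code, so the fit assumes a linear trend from Monday to Sunday.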

Residual diagnostics (autocorrelation test)

Residual diagnostics (normality test)


    Shapiro-Wilk normality test

data:  res
W = 0.85767, p-value < 2.2e-16

The residuals are uncorrelated but not normally distributed (Shapiro-Wilk W = 0.858, p < 2.2e-16), so the model needs to be modified.
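
The diagnostics can be reproduced along the following lines. This is a sketch under stated assumptions, not the original analysis: the data frame `sim` is simulated as a stand-in for `data1`, the Ljung-Box test is an assumed choice because the report does not say which autocorrelation test was used, and the log transformation is only one possible modification of the model.

```r
set.seed(1)

# Simulated stand-in for data1 (NOT the original data): right-skewed amounts
n <- 883
sim <- data.frame(
  gender  = factor(sample(c("F", "M"), n, replace = TRUE)),
  age     = sample(18:70, n, replace = TRUE),
  Weekday = sample(1:7, n, replace = TRUE)
)
sim$amount <- exp(7 + 0.2 * (sim$gender == "M") - 0.005 * sim$age +
                    rnorm(n, sd = 0.5))

# Same formula as the fitted model in the report
fit_raw <- lm(amount ~ gender + age + Weekday, data = sim)
res <- residuals(fit_raw)

# Autocorrelation check (Ljung-Box is an assumed choice of test)
lb <- Box.test(res, lag = 10, type = "Ljung-Box")

# Normality check, as in the report
sw_raw <- shapiro.test(res)

# One possible modification: model log(amount) instead of amount
fit_log <- lm(log(amount) ~ gender + age + Weekday, data = sim)
sw_log <- shapiro.test(residuals(fit_log))

print(c(ljung_box_p = lb$p.value,
        W_raw = unname(sw_raw$statistic),
        W_log = unname(sw_log$statistic)))
```

On the simulated data the W statistic moves closer to 1 after the log transformation. Whether the same holds for the real salary data would need to be checked by refitting the model on `data1`.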